Students at first-level, schools at second-level, districts at third-level
Can also refer to clients within therapists, classrooms within schools, employees within organizations, people within families, measurements within a person, or citizens within a country
What is another name for multilevel models?
In a data structure where data is collected for people multiple times, what is the level-1 unit? What is the level-2 unit?
Mixed-effects models
Measurement occasions; People
Let’s load a dataset from Ch. 3 of Heck, R. H., Thomas, S. L., & Tabata, L. N. (2011). Multilevel and Longitudinal Modeling with IBM SPSS: Taylor & Francis.
Datasets can be pretty large for multilevel data so I generally favor more efficient data reading functions like read_csv or fread
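As a sketch of what that looks like, here is `fread` reading a small temporary CSV standing in for the real dataset (the temporary file and its columns are illustrative, not the actual Heck et al. data file):

```r
library(data.table)

# Write a tiny stand-in CSV; in practice you would point fread() at the
# actual dataset path instead
tmp <- tempfile(fileext = ".csv")
writeLines(c("schcode,id,math",
             "1,6701,47.14",
             "1,6702,63.61"), tmp)

# fread() is a fast alternative to read.csv() for large multilevel datasets
df <- fread(tmp)
nrow(df)  # 2
```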
Let’s inspect the data. One thing that is immediately apparent is that “schcode” for school is repeated across multiple observations, whereas each “id” for student is distinct/unique. This shows that multiple students are nested within each school.
schcode Rid id female ses femses math ses_mean pro4yrc public
1: 1 1 6701 1 0.586 0.586 47.1400 -0.2667500 0.08333333 0
2: 1 2 6702 1 0.304 0.304 63.6100 -0.2667500 0.08333333 0
3: 1 3 6703 1 -0.544 -0.544 57.7100 -0.2667500 0.08333333 0
4: 1 4 6704 0 -0.848 0.000 53.9000 -0.2667500 0.08333333 0
5: 1 5 6705 0 0.001 0.000 58.0100 -0.2667500 0.08333333 0
6: 1 6 6706 0 -0.106 0.000 59.8700 -0.2667500 0.08333333 0
7: 1 7 6707 0 -0.330 0.000 62.5556 -0.2667500 0.08333333 0
8: 1 8 6708 1 -0.891 -0.891 47.0100 -0.2667500 0.08333333 0
9: 1 9 6709 0 0.207 0.000 72.4200 -0.2667500 0.08333333 0
10: 1 10 6710 1 -0.341 -0.341 65.8400 -0.2667500 0.08333333 0
11: 1 11 6711 0 -0.171 0.000 57.3400 -0.2667500 0.08333333 0
12: 1 12 6712 1 -1.068 -1.068 62.5556 -0.2667500 0.08333333 0
13: 2 1 3703 0 -0.105 0.000 61.9528 0.6799231 1.00000000 0
14: 2 2 3704 0 1.280 0.000 70.2200 0.6799231 1.00000000 0
15: 2 3 3705 0 1.056 0.000 58.7800 0.6799231 1.00000000 0
16: 2 4 3706 0 0.800 0.000 65.5400 0.6799231 1.00000000 0
17: 2 5 3707 0 0.733 0.000 59.7700 0.6799231 1.00000000 0
18: 2 6 3708 0 0.130 0.000 64.0700 0.6799231 1.00000000 0
19: 2 7 3709 0 0.682 0.000 61.9528 0.6799231 1.00000000 0
20: 2 8 3710 0 0.917 0.000 66.8300 0.6799231 1.00000000 0
21: 2 9 3711 0 0.810 0.000 69.4100 0.6799231 1.00000000 0
22: 2 10 3712 0 0.518 0.000 56.5100 0.6799231 1.00000000 0
23: 2 11 3713 0 0.464 0.000 57.2200 0.6799231 1.00000000 0
24: 2 12 3714 0 0.552 0.000 64.8900 0.6799231 1.00000000 0
25: 2 13 3715 0 1.002 0.000 70.1200 0.6799231 1.00000000 0
26: 3 1 861 1 -0.302 -0.302 48.7600 -0.5480000 0.33333333 0
27: 3 2 862 1 -0.648 -0.648 44.6500 -0.5480000 0.33333333 0
28: 3 3 863 1 -0.448 -0.448 60.5929 -0.5480000 0.33333333 0
29: 3 4 864 1 -0.096 -0.096 44.9900 -0.5480000 0.33333333 0
30: 3 5 865 1 -0.291 -0.291 50.9000 -0.5480000 0.33333333 0
31: 3 6 866 0 -1.650 0.000 42.9400 -0.5480000 0.33333333 0
32: 3 7 867 0 -1.125 0.000 50.6700 -0.5480000 0.33333333 0
33: 3 8 868 1 -1.872 -1.872 39.6700 -0.5480000 0.33333333 0
34: 3 9 869 1 -0.602 -0.602 37.8000 -0.5480000 0.33333333 0
35: 3 10 870 0 -0.922 0.000 44.1000 -0.5480000 0.33333333 0
36: 3 11 871 1 0.228 0.228 57.2300 -0.5480000 0.33333333 0
37: 3 12 872 1 0.401 0.401 40.9700 -0.5480000 0.33333333 0
38: 3 13 873 0 -0.119 0.000 37.6200 -0.5480000 0.33333333 0
39: 3 14 874 0 -0.450 0.000 60.5929 -0.5480000 0.33333333 0
40: 3 15 875 1 -0.246 -0.246 52.2500 -0.5480000 0.33333333 0
41: 3 16 876 0 -0.868 0.000 60.5929 -0.5480000 0.33333333 0
42: 3 17 877 1 -0.986 -0.986 41.6000 -0.5480000 0.33333333 0
43: 3 18 878 0 0.132 0.000 40.3400 -0.5480000 0.33333333 0
44: 4 1 6799 1 0.966 0.966 62.7079 0.8159412 1.00000000 0
45: 4 2 6800 1 0.823 0.823 62.7079 0.8159412 1.00000000 0
46: 4 3 6801 1 -0.212 -0.212 62.7079 0.8159412 1.00000000 0
47: 4 4 6802 1 0.895 0.895 62.7079 0.8159412 1.00000000 0
48: 4 5 6803 1 1.065 1.065 69.0400 0.8159412 1.00000000 0
49: 4 6 6804 1 1.251 1.251 78.9800 0.8159412 1.00000000 0
50: 4 7 6805 1 0.761 0.761 62.7079 0.8159412 1.00000000 0
schcode Rid id female ses femses math ses_mean pro4yrc public
Assumes residuals are uncorrelated
Observations are assumed to be independent
Effects are independent
Formula: \(math_{i}=β_{0}+β_{1}ses_{i}+\epsilon_{i}\)
Call:
lm(formula = math ~ ses, data = df)
Residuals:
Min 1Q Median 3Q Max
-31.459 -4.678 1.144 5.355 47.560
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.59817 0.09819 586.61 <2e-16 ***
ses 4.25468 0.12566 33.86 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.132 on 6869 degrees of freedom
Multiple R-squared: 0.143, Adjusted R-squared: 0.1429
F-statistic: 1146 on 1 and 6869 DF, p-value: < 2.2e-16
So those t-statistics and p-values suggest that an association this strong would be very unlikely to be observed if the null hypothesis were true, right? Let’s see if the effect differs between girls and boys.
Call:
lm(formula = math ~ ses * female, data = df)
Residuals:
Min 1Q Median 3Q Max
-30.975 -4.596 1.172 5.288 48.265
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.1441 0.1393 417.506 < 2e-16 ***
ses 4.0533 0.1775 22.840 < 2e-16 ***
female -1.0737 0.1961 -5.475 4.54e-08 ***
ses:female 0.3472 0.2510 1.383 0.167
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.115 on 6867 degrees of freedom
Multiple R-squared: 0.1469, Adjusted R-squared: 0.1465
F-statistic: 394.2 on 3 and 6867 DF, p-value: < 2.2e-16
There’s not much of a difference in the effect of SES between boys and girls.
Can you identify any issues with modeling these data using the previous approach?
We treated all schools as if they share the same slope and intercept, but surely schools differ in their average math achievement and in the effect of SES on math achievement, right?
This is the problem of non-independence: some schools may have higher or lower average math achievement, and the association of gender with math achievement may differ between schools.
We can use a neat package called performance to inspect the GLM assumptions, as well.
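A minimal sketch of what that inspection looks like (the object name `OLS1` is an assumption for the `lm()` fit above):

```r
library(performance)

# OLS1 is an assumed name for the earlier fit, e.g.:
# OLS1 <- lm(math ~ ses, data = df)
check_model(OLS1)                # panel of diagnostic plots (normality, linearity, ...)
check_heteroscedasticity(OLS1)  # test for non-constant error variance
```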
Simpson’s paradox: “a paradox in which a correlation (trend) present in different groups is reversed when the groups are combined.”
Individual data points cluster within higher-level units, but OLS regression assumes observations are independently distributed.
Ignoring the hierarchical structure and analyzing the data as if they were disaggregated leads to underestimated standard errors and overestimated statistical significance.
If I know Timmy’s reading score and know that Tommy is in the same school as Timmy, I already know a little bit about Tommy’s score too
The unique contribution of each student is reduced because some information is shared by knowing which school they are in
Inflated risk of Type I error! If we assume observations are independent, we think the SE is smaller than it really is, treating the data as 100 independent cases instead of 10 clusters of 10…
\(t = \frac{b}{SE}\)
\(SE=\frac{\sigma}{\sqrt{N}}\)
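To see why this matters, compare the SE computed as if all 100 observations were independent against the SE computed from only the 10 truly independent clusters (the numbers are purely illustrative):

```r
sigma <- 8                          # illustrative residual SD
se_100_indep   <- sigma / sqrt(100) # pretend all 100 observations are independent
se_10_clusters <- sigma / sqrt(10)  # only 10 truly independent clusters

se_100_indep    # 0.8
se_10_clusters  # ~2.53, more than three times larger
```

With the too-small SE, the t-statistic \(t = b/SE\) is inflated, which is exactly the Type I error risk described above.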
What does non-independence refer to and why is it a problem?
| Fixed Effects | Random Effects |
|---|---|
| Assumed constant across clusters (e.g., the average effect of SES) | Allowed to vary across clusters (e.g., each school’s deviation from the average) |
| Estimated as regression coefficients | Summarized by variances and covariances |
What is “mixed” in a mixed model?
How are fixed and random effects different?
Do you expect there to be variation in the intercept (i.e., average) between schools?
Do you expect there to be variation in the effect of X on Y (i.e., slopes) between schools?
“Random effects” == “by-school varying intercepts and slopes”
Do we expect the schools to have different averages in math achievement? Then, we should model a random-intercept model.
Do we expect the schools to differ in their effects of gender on math achievement? Then, we should model a random-slope model.
When or why do we need a random intercept? When or why do we need a random slope?
lme4 is likely the most popular and widely used package for mixed-effects modeling in R. lmerTest estimates the degrees of freedom for each parameter in order to conduct hypothesis/significance testing.
We can begin with a null model, sometimes called the intercept-only model. In this model, we simply model the DV with no regressors. We estimate variance at Level 1 (residual) and Level 2 (between-cluster), but include no predictors.
Level 1: \(math_{ij}=β_{0j}+R_{ij}\)
Level 2: \(β_{0j}=γ_{00}+U_{0j}\)
Combined: \(math_{ij}=γ_{00}+U_{0j}+R_{ij}\)
The elements in the parentheses represent the random effects. On the right side of the vertical line is the random factor (i.e., schools). On the left side are the random slopes for the fixed effects (i.e., random variation in each cluster’s effect). Outside the parentheses are the fixed effects. This is a null model, so a 1 denotes no fixed effects besides an intercept.
The random intercept allows us to account for variability in cluster (here, school) averages.
If you look below under random effects, you’ll notice that random-effects statistics are reported for both schools and residuals. You can understand these as reflecting the between-school variance and the within-school (student-level) variance, respectively.
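The summary below comes from a call of this form (the object name GLMM.Null matches the LRT output later in this document):

```r
library(lme4)
library(lmerTest)  # adds Satterthwaite df and p-values to lmer() output

# Intercept-only (null) model: fixed intercept plus a by-school random intercept
GLMM.Null <- lmer(math ~ 1 + (1 | schcode), data = df)
summary(GLMM.Null)
```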
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 1 + (1 | schcode)
Data: df
REML criterion at convergence: 48877.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.6336 -0.5732 0.1921 0.6115 5.2989
Random effects:
Groups Name Variance Std.Dev.
schcode (Intercept) 10.64 3.262
Residual 66.55 8.158
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.6742 0.1883 416.0655 306.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ICCs can tell us whether a mixed model is justified. The ICC is the ratio of the between-cluster variance to the total variance. It can be interpreted as (1) the proportion of variance in the outcome attributable to clusters or (2) the expected correlation between the outcomes of two randomly selected units from the same randomly selected cluster.
Level-2 Variance (i.e., Random-Intercept Variance) / Total Variance (i.e., Random-Intercept Variance + Residual Variance)
\(\frac{\tau^2_{0} }{\tau^2_{0} + \sigma^2}\)
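Plugging in the variance components from the null model output above (between-school 10.64, residual 66.55):

```r
tau2_0 <- 10.64  # between-school (random-intercept) variance
sigma2 <- 66.55  # residual (within-school) variance

icc <- tau2_0 / (tau2_0 + sigma2)
round(icc, 3)  # 0.138
```

So roughly 14% of the variance in math achievement lies between schools. (The performance package’s `icc()` function computes this directly from a fitted model.)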
What does an intraclass correlation coefficient tell us (in the context of MLMs)?
We can move to the next step and model the effect of student-level SES on student-level math achievement, allowing the intercept to vary between schools. This means each school is allowed to have a different mean math achievement when SES = 0.
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode)
Data: df
REML criterion at convergence: 48215.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.7733 -0.5540 0.1303 0.6469 5.6908
Random effects:
Groups Name Variance Std.Dev.
schcode (Intercept) 3.469 1.863
Residual 62.807 7.925
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.5960 0.1329 375.6989 433.36 <2e-16 ***
ses 3.8739 0.1366 3914.6382 28.35 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
ses -0.025
We can perform a Likelihood Ratio Test (LRT) to test whether a model with an effect outperforms a model without it; such models are called “nested models”. The test compares the likelihood of the data under each model, hence the name. A small p-value (equivalently, a large Chi-square) and lower deviance, BIC, and AIC all indicate that the fuller model outperforms the simpler model.
Data: df
Models:
GLMM.Null: math ~ 1 + (1 | schcode)
GLMM.L1SES: math ~ ses + (1 | schcode)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
GLMM.Null 3 48882 48902 -24438 48876
GLMM.L1SES 4 48219 48246 -24106 48211 664.66 1 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If we divide the difference between the null model’s level-1 (residual) variance and this new model’s level-1 variance by the null model’s variance, we get the proportion of level-1 variance that was reduced.
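Using the residual variances printed above (null model: 66.55; model with ses: 62.807):

```r
var_null <- 66.55   # level-1 residual variance, null model
var_l1   <- 62.807  # level-1 residual variance, model with ses

prop_reduced <- (var_null - var_l1) / var_null
round(prop_reduced, 3)  # 0.056
```

Adding student-level SES reduced the level-1 variance by about 5.6%, a pseudo-R² for the level-1 predictor.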
Level 1: \(math_{ij}=β_{0j}+β_{1j}SES_{ij}+R_{ij}\)
Level 2: \(β_{0j}=γ_{00}+U_{0j}\) & \(β_{1j}=γ_{10}\)
Combined: \(math_{ij}=γ_{00}+γ_{10}SES_{ij}+U_{0j}+R_{ij}\)
the fixed effect for the intercept, controlling for ses; \(γ_{00}\)
the fixed effect for the slope of ses; \(γ_{10}\)
a random effect for the intercept capturing the variance of schools around the intercept, \(\tau^{2}_{0}\), controlling for ses. Every school has a residual around the intercept, \(U_{0j}\)
a random effect capturing the variance of students around their school mean math achievement, controlling for ses. \(\sigma^2\)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode)
Data: df
REML criterion at convergence: 48215.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.7733 -0.5540 0.1303 0.6469 5.6908
Random effects:
Groups Name Variance Std.Dev.
schcode (Intercept) 3.469 1.863
Residual 62.807 7.925
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.5960 0.1329 375.6989 433.36 <2e-16 ***
ses 3.8739 0.1366 3914.6382 28.35 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
ses -0.025
As shown in the earlier plot, it’s quite reasonable to assume that the effect of SES may vary between schools. Let’s test it:
Note that a 1 denotes an intercept; there is an “implicit” or “default” 1 both outside the parentheses and inside them. This formula is equivalent to the prior syntax but more “explicit”.
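The output below corresponds to a call of this form (the object name GLMM.L1SESRS is an assumption):

```r
# Random intercept and random slope of ses, with their correlation estimated
GLMM.L1SESRS <- lmer(math ~ 1 + ses + (1 + ses | schcode), data = df)
summary(GLMM.L1SESRS)
```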
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 1 + ses + (1 + ses | schcode)
Data: df
REML criterion at convergence: 48190.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.8578 -0.5553 0.1290 0.6437 5.7098
Random effects:
Groups Name Variance Std.Dev. Corr
schcode (Intercept) 3.2042 1.7900
ses 0.7794 0.8828 -1.00
Residual 62.5855 7.9111
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.6959 0.1315 378.6378 438.78 <2e-16 ***
ses 3.9602 0.1408 1450.7730 28.12 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
ses -0.284
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
If you wanted to remove the intercept for your fixed effects and the intercept for your random effects (not recommended), you would replace these “default” 1s with 0s. You can see below that there is no longer an “(Intercept)” under random effects or fixed effects.
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 0 + ses + (0 + ses | schcode)
Data: df
REML criterion at convergence: 74122.3
Scaled residuals:
Min 1Q Median 3Q Max
-0.8263 0.5688 0.9386 1.1965 2.3970
Random effects:
Groups Name Variance Std.Dev.
schcode ses 1244 35.27
Residual 2566 50.66
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
ses 3.882 1.943 422.105 1.998 0.0464 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Fitting this model with a random slope for SES, you may notice the warning, “boundary (singular) fit: see help(‘isSingular’)”. This occurs when one of your random-effects parameters (as seen in your VCOV matrix) is estimated at the boundary of its allowed space, such as a variance near 0 or a correlation of ±1, which can happen because the parameter truly is approximately 0 or because of multicollinearity.
Let’s inspect the variance-covariance matrix and the random effects. The VCOV shows no variances near 0. But the random effects correlation shows a -1.0 correlation.
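A sketch of that inspection (GLMM.L1SESRS is an assumed name for the random-slope model fit above):

```r
# VarCorr() prints the estimated random-effects variances, SDs, and correlations
VarCorr(GLMM.L1SESRS)

# as.data.frame() gives the same components in long form, handy for
# programmatically checking for variances near 0 or correlations at ±1
as.data.frame(VarCorr(GLMM.L1SESRS))
```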
What should we inspect if we observe a “singular fit”?
We can attempt to fix the singular fit by removing this perfect correlation. Specifying the random intercept for schools, (1 | schcode), and the random slope of SES for schools, (0 + ses | schcode), as separate terms removes the correlation. Voila! The perfect correlation is gone and the singularity is fixed!
GLMM.L1SESV <- lmer(math ~ ses + (1 | schcode) + (0 + ses | schcode), data = df)
summary(GLMM.L1SESV)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode) + (0 + ses | schcode)
Data: df
REML criterion at convergence: 48213.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.7791 -0.5526 0.1327 0.6466 5.7089
Random effects:
Groups Name Variance Std.Dev.
schcode (Intercept) 3.3222 1.8227
schcode.1 ses 0.7205 0.8488
Residual 62.5213 7.9070
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.5888 0.1328 374.9738 433.55 <2e-16 ***
ses 3.8803 0.1435 377.2408 27.04 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
ses -0.023
Level 1: \(math_{ij}=β_{0j}+β_{1j}SES_{ij}+R_{ij}\)
Level 2: \(β_{0j}=γ_{00}+U_{0j}\) & \(β_{1j}=γ_{10}+U_{1j}\)
Combined: \(math_{ij}=γ_{00}+γ_{10}SES_{ij}+U_{0j}+U_{1j}SES_{ij}+R_{ij}\)
the fixed effect for the intercept, controlling for ses; \(γ_{00}\)
the fixed effect for the slope of ses; \(γ_{10}\)
a random effect capturing the variance of students around their school’s mean math achievement, controlling for ses; \(\sigma^{2}\)
a random effect for the intercept capturing the variance of schools around the intercept, controlling for ses; \(\tau^{2}_{0}\)
a random effect for the slope capturing variance of school slopes around the grand mean slope, controlling for ses; \(\tau^{2}_{1}\)
a random effect covariance capturing how the intercept variance and slope variance relate to each other; \(\tau_{01}\)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode) + (0 + ses | schcode)
Data: df
REML criterion at convergence: 48213.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.7791 -0.5526 0.1327 0.6466 5.7089
Random effects:
Groups Name Variance Std.Dev.
schcode (Intercept) 3.3222 1.8227
schcode.1 ses 0.7205 0.8488
Residual 62.5213 7.9070
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.5888 0.1328 374.9738 433.55 <2e-16 ***
ses 3.8803 0.1435 377.2408 27.04 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
ses -0.023
In a multilevel model, we don’t have to model only predictors that vary across level-1 units; we can also model predictors that are constant across level-1 units but differ across level-2 (or level-3, or level-4) units. Here, those are school-level variables. Let’s focus on the categorical variable of whether a school is public or private.
Level 1: \(math_{ij}=β_{0j}+β_{1j}SES_{ij}+R_{ij}\)
Level 2: \(β_{0j}=γ_{00}+γ_{01}public_{j}+U_{0j}\) & \(β_{1j}=γ_{10}+U_{1j}\)
Combined: \(math_{ij}=γ_{00}+γ_{01}public_{j}+γ_{10}SES_{ij}+U_{0j}+U_{1j}SES_{ij}+R_{ij}\)
Note: You may see \(public_{j}\) only predicts the level-2 intercept but not slope. This is intentional and only models a main effect of the level-2 predictor. Modeling the level-2 predictor for the level-2 slope would result in a cross-level interaction.
the fixed effect for the intercept, controlling for ses and public; \(γ_{00}\)
the fixed effect for the slope of public controlling for ses; \(γ_{01}\)
the fixed effect for the slope of ses controlling for public; \(γ_{10}\)
a random effect for the intercept capturing the variance of schools around the intercept, controlling for ses and public; \(\tau^{2}_{0}\)
a random effect capturing the variance of students around their school mean math achievement, controlling for ses and public; \(\sigma^{2}\)
a random effect for the slope capturing variance of school slopes around the grand mean slope, controlling for ses; \(\tau^{2}_{1}\)
a random effect covariance capturing how the intercept variance and slope variance relate to each other; \(\tau_{01}\)
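The output below corresponds to a call of this form (the object name GLMM.L2Public is an assumption):

```r
# Level-2 predictor (public) added as a main effect on the intercept only
GLMM.L2Public <- lmer(math ~ 1 + ses + public + (ses | schcode), data = df)
summary(GLMM.L2Public)
```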
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 1 + ses + public + (ses | schcode)
Data: df
REML criterion at convergence: 48190.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.8542 -0.5543 0.1299 0.6421 5.7122
Random effects:
Groups Name Variance Std.Dev. Corr
schcode (Intercept) 3.2147 1.793
ses 0.7904 0.889 -1.00
Residual 62.5840 7.911
Number of obs: 6871, groups: schcode, 419
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 57.8209 0.2469 395.1186 234.162 <2e-16 ***
ses 3.9631 0.1410 1433.2714 28.102 <2e-16 ***
public -0.1699 0.2855 391.3491 -0.595 0.552
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) ses
ses -0.122
public -0.846 -0.036
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
A factor is a variable.
An effect is the variable’s coefficient.
Full formula: \(math_{ij}=γ_{00}+γ_{10}ses_{ij}+U_{0j}+U_{1j}ses_{ij}+R_{ij}\)
Where we have two fixed effects: \(γ_{00}\) and \(γ_{10}\)
We have three random effects: random-intercept \(U_{0j}\), random-slope of SES \(U_{1j}ses_{ij}\) and level-1 variance \(R_{ij}\)
And one random factor: School, which is not seen directly in the formula but is denoted by the subscript “j”. That’s because the fixed terms average over all the schools, while the random terms are estimated per school.
Moving on…
What if we want to know how school type affects the slope of ses, though? In other words, is there a difference in the effect of SES on math achievement in a private or public school?
This is one of the biggest advantages of using MLMs/MEMs: you can estimate cross-level interactions between variables at higher-level units and variables at lower-level units. This is reflected by predicting the SES slopes (L1: Math ~ SES) from school type (L2: SES slope ~ Public).
The parameters in this model are identical to those of the random-intercepts and random-slopes model estimated prior, except for one additional parameter:
public on the slope of ses; \(γ_{11}\)
Why is the cross-level interaction modeled as the slope of SES predicted by public? Let’s do the math.
Level 1: \(math_{ij}=β_{0j}+β_{1j}ses_{ij}+R_{ij}\)
Level 2 Slope: \(β_{1j}=γ_{10}+γ_{11}public_{j}+U_{1j}\)
Substituting the Level-2 slope equation for \(β_{1j}\) in Level 1:
\(math_{ij}=β_{0j}+(γ_{10}+γ_{11}public_{j}+U_{1j})ses_{ij}+R_{ij}\)
Combined: \(math_{ij}=β_{0j}+γ_{10}ses_{ij}+γ_{11}ses_{ij}*public_{j}+U_{1j}ses_{ij}+R_{ij}\)
Now, we have slopes predicted by the level-2 variable, which estimates a cross-level interaction.
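In lmer syntax, the cross-level interaction is just a product term between the level-1 and level-2 predictors (the object name GLMM.CrossLevel is an assumption):

```r
# ses * public expands to ses + public + ses:public;
# the ses:public coefficient is the cross-level interaction (gamma_11)
GLMM.CrossLevel <- lmer(math ~ ses * public + (ses | schcode), data = df)
summary(GLMM.CrossLevel)
```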
Every lower-level unit is unique to each higher-level unit
e.g. Different classes for each school (or different students for each class)
Every lower-level unit appears in every higher-level unit.
e.g. Same classes for each school. (Intuitive psychology example: Same stimuli for each participant)
By-Participant and By-Stimuli Random-Intercepts and By-Participant Random-Slopes for \(X_{i}\) on reaction time
mod_sim <- lmer(RT ~ 1 + X_i + (1 | item_id) + (1 + X_i | subj_id),
data = sim_dat)
summary(mod_sim)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: RT ~ 1 + X_i + (1 | item_id) + (1 + X_i | subj_id)
Data: sim_dat
REML criterion at convergence: 67724.8
Scaled residuals:
Min 1Q Median 3Q Max
-3.7952 -0.6419 0.0011 0.6540 3.4639
Random effects:
Groups Name Variance Std.Dev. Corr
subj_id (Intercept) 12645 112.45
X_i 1260 35.49 0.37
item_id (Intercept) 5960 77.20
Residual 41024 202.54
Number of obs: 5000, groups: subj_id, 100; item_id, 50
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 816.65 15.93 123.63 51.256 < 2e-16 ***
X_i 82.77 22.85 50.21 3.622 0.00068 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
X_i 0.041
Pulls more extreme estimates towards an overall average
Information from all groups can be used to inform parameter estimates for each individual group
Gives more “weight” to groups with more data…
Conditional R2: takes both the fixed and random effects into account.
Marginal R2: considers only the variance of the fixed effects.
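The performance package reports both at once for a fitted merMod (here applied to the GLMM.L1SESV model fit earlier):

```r
library(performance)

# r2() returns the marginal R2 (fixed effects only) and the
# conditional R2 (fixed + random effects) for a mixed model
r2(GLMM.L1SESV)
```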
Optimizers cannot always find the combination of parameters that maximizes the likelihood of observing your data. When an optimizer fails to find a solution, this is called non-convergence.
Number of iterations: if you increase the number of iterations, the algorithm will search for longer. This is the equivalent of having our puzzle-doer sit at the table for longer, trying more pieces and more combinations.
Algorithm: the algorithm determines how the optimizer chooses its next attempted solution. What strategy is our puzzle-doer using to fit pieces into the puzzle?
Tolerance: this can get a bit technical and vary depending on context, so we suggest Brauer and Curtin, 2018 for more. But in our case, we can think of it as the algorithm’s tolerance for differences in solutions. Lower tolerance means slightly different solutions will be seen as different, whereas higher tolerance means two different solutions that are still kind of close will be treated as essentially the same. Maybe our puzzle-doer needs glasses; tolerance is like whether they’re wearing their glasses and can distinguish between two close-but-not-identical assembled puzzles.
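The diff_optims object used below can be produced with lme4’s allFit(), which refits the same model under every available optimizer so their convergence messages can be compared (GLMM.L1SESRS is an assumed model name):

```r
# Refit the model with each available optimizer; maxfun caps the number
# of function evaluations per optimizer
diff_optims <- allFit(GLMM.L1SESRS, maxfun = 1e5)
```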
diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")]
lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
$bobyqa
[1] "boundary (singular) fit: see help('isSingular')"
$Nelder_Mead
[1] "boundary (singular) fit: see help('isSingular')"
$nlminbwrap
[1] "boundary (singular) fit: see help('isSingular')"
$`optimx.L-BFGS-B`
[1] "boundary (singular) fit: see help('isSingular')"
$nloptwrap.NLOPT_LN_NELDERMEAD
[1] "boundary (singular) fit: see help('isSingular')"
$nloptwrap.NLOPT_LN_BOBYQA
[1] "boundary (singular) fit: see help('isSingular')"
In your lmer/glmer call, you can include a control argument such as the one below to increase the number of iterations or change the optimization algorithm.
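For example, switching to the bobyqa optimizer and raising the evaluation limit (the model formula here is the random-slope model from earlier; adjust to your own):

```r
# control = lmerControl(...) tunes the optimizer without changing the model:
# "bobyqa" is an alternative optimizer, and maxfun raises the cap on
# function evaluations before the optimizer gives up
model <- lmer(math ~ ses + (1 + ses | schcode), data = df,
              control = lmerControl(optimizer = "bobyqa",
                                    optCtrl = list(maxfun = 2e5)))
```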
https://humburg.github.io/Power-Analysis/simr_power_analysis.html
https://aguinis.shinyapps.io/ml_power/
https://koumurayama.shinyapps.io/summary_statistics_based_power/
http://mfviz.com/hierarchical-models/